#DRAFT/WIP 20 allow custom submissions to slurm #21
base: main
Running into this error on slurm due to https://github.com/eic/simulation_campaign_single/blob/main/scripts/run.sh#L87
Is this merging https://github.com/eic/job_submission_slurm/ into this?
It'd be great if that means we can give up on having two repos.
scripts/submit_csv.sh
Outdated
condor_submit -verbose -file ${SUBMIT_FILE}
Not needed anymore when SYSTEM==slurm?
scripts/submit_csv.sh
Outdated
@@ -65,9 +89,44 @@ sed "
s|%INPUT_FILES%|${INPUT_FILES}|g;
s|%REQUIREMENTS%|${REQUIREMENTS}|g;
s|%CSV_FILE%|${CSV_FILE}|g;
s|%ACCOUNT%|${ACCOUNT:-rrg-wdconinc}|g;
Prefer to crash this if not specified, rather than have everyone submit under my allocation ;-)
Example: ${ACCOUNT:?Define ACCOUNT with the slurm account}, ref: https://www.gnu.org/software/bash/manual/bash.html#Shell-Parameter-Expansion
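To illustrate the suggestion above, here is a minimal sketch of how the `:?` expansion behaves. The helper name `render_account` and the account value `def-example` are made up for this illustration and are not part of the PR; the subshells are only there so the failure does not terminate the calling shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit the sed expression for the account placeholder.
render_account() {
  # ${ACCOUNT:?message} prints the message to stderr and aborts the
  # (sub)shell when ACCOUNT is unset or empty, instead of silently
  # falling back to someone else's allocation.
  echo "s|%ACCOUNT%|${ACCOUNT:?Define ACCOUNT with the slurm account}|g;"
}

# ACCOUNT missing: the subshell aborts with a non-zero status.
( unset ACCOUNT; render_account ) 2>/dev/null \
  || echo "ACCOUNT missing: submission aborted"

# ACCOUNT provided (placeholder value): the expression is rendered normally.
( ACCOUNT=def-example; render_account )
```

With this form, a submitter who forgets to set ACCOUNT gets an immediate error message rather than a job charged to the default allocation.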
Yup. That's the idea.
It also gives submitters the same familiar interface that the campaign coordinators use. For example, users can run custom workflows like this on narval or ifarm without having write privileges to xrootd:
CAMPAIGN_OUTPUT=$SHARED/ePIC/ePIC-Campaign-Organizer/Campaign_Output SYSTEM=slurm EBEAM=5 PBEAM=41 DETECTOR_VERSION=main DETECTOR_CONFIG=epic_craterlake JUG_XL_TAG=nightly CSV_FILE=gamma_1GeV_part2.csv ./scripts/submit_csv.sh narval_csv single SINGLE/etaScan/gamma.csv 1
@@ -28,12 +28,32 @@ shift
TARGET=${1:-2}
shift

# environment variable to indicate whether the job is running on condor or slurm
SYSTEM=${SYSTEM:-condor}
SYSTEM may be a bit generic. How about SCHEDULER or something like that?
# create command line
EXECUTABLE="./scripts/run.sh"
ARGUMENTS="${TYPE} EVGEN/\$(file).\$(ext) \$(nevents) \$(ichunk)"
EXECUTABLE="$PWD/scripts/run.sh"
Probably should use something that determines the directory of the current script, e.g. SCRIPTDIR=$(dirname $0) or so. Then this script can be run from outside the directory, and the files it generates can be better organized.
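A slightly more defensive version of the SCRIPTDIR idea could look like the sketch below. It assumes the submit script lives next to run.sh in scripts/, as in this repository; the `cd ... && pwd` step is an addition beyond the one-liner above, used here only to turn the result into an absolute path.

```shell
#!/usr/bin/env bash
# Directory that contains this script, as an absolute path, regardless of
# the directory the user invoked it from. The "--" guards against a $0
# that starts with a dash.
SCRIPTDIR=$(cd -- "$(dirname -- "$0")" && pwd)

# Assumption: run.sh sits in the same directory as the submit script.
EXECUTABLE="${SCRIPTDIR}/run.sh"

echo "SCRIPTDIR=${SCRIPTDIR}"
echo "EXECUTABLE=${EXECUTABLE}"
```

Unlike `$PWD/scripts/run.sh`, this does not depend on the submitter calling the script from the repository root.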
if [ ${SYSTEM} = "condor" ]; then
ARGUMENTS="${TYPE} EVGEN/\$(file).\$(ext) \$(nevents) \$(ichunk)"
elif [ ${SYSTEM} = "slurm" ]; then
This if/elif/else/fi chain can be turned into a case statement.
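As a rough sketch of what the case form might look like for the ARGUMENTS selection: the function name `build_arguments` and the default `TYPE=single` are placeholders for this illustration, and the catch-all branch is an addition that the original if/elif chain does not have.

```shell
#!/usr/bin/env bash
SYSTEM=${SYSTEM:-condor}
TYPE=${TYPE:-single}   # placeholder default, for illustration only

# Hypothetical helper: print the argument string for the given scheduler.
build_arguments() {
  case "$1" in
    condor)
      # The escaped \$(...) stay literal: they are condor submit-file macros.
      printf '%s\n' "${TYPE} EVGEN/\$(file).\$(ext) \$(nevents) \$(ichunk)"
      ;;
    slurm)
      printf '%s\n' "${TYPE}"
      ;;
    *)
      # Unknown scheduler: report it instead of continuing silently.
      echo "Unknown SYSTEM '$1' (expected condor or slurm)" >&2
      return 1
      ;;
  esac
}

ARGUMENTS=$(build_arguments "${SYSTEM}") && echo "ARGUMENTS=${ARGUMENTS}"
```

The quoted `"$1"` also avoids the word-splitting problem that an unquoted `${SYSTEM}` inside `[ ... ]` would have if the variable were empty.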
elif [ ${SYSTEM} = "slurm" ]; then
ARGUMENTS="${TYPE}"
# FIXME: This is not ideal. It prevents from submitting multiple jobs with different JUG_XL_TAG simultaneously.
cd scripts
See above for SCRIPTDIR, which would be useful here. Also pushd-popd for directory stacks then.
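For the pushd/popd part, something along these lines might work. The wrapper name `run_in_dir` is invented for this sketch, and the final call only runs `pwd` as a stand-in for whatever the slurm branch needs to do inside scripts/.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command inside a directory, then return to
# wherever the caller was, preserving the command's exit status.
run_in_dir() {
  local dir=$1
  shift
  pushd "$dir" > /dev/null || return 1
  "$@"
  local status=$?
  popd > /dev/null || return 1
  return "$status"
}

# Directory of this script, independent of the caller's working directory.
SCRIPTDIR=$(cd -- "$(dirname -- "$0")" && pwd)

# Stand-in command: print the directory the command actually ran in.
run_in_dir "$SCRIPTDIR" pwd
```

Because the working directory is restored afterwards, the rest of the submit script can keep using paths relative to where the user started.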
done
echo "Submitting ${NJOBS} to a ${SYSTEM} system"

if [ ${SYSTEM} = "condor" ]; then
case.
@wdconinc Here is a minimal script that just tries to run the same command on the remote node. Surprisingly, it doesn't hit permission issues when writing that file. Granted, I downloaded eic-shell prior to submitting the job. It's really confusing why the main job submission script is failing.
Here is the slurm log file and the job finishes without issues
where gamma_1GeV_small.csv contains
And that leads to this log
This issue can be resolved here with
but that doesn't actually get us any further on the osg condor side.
(with that diff it just fails a few lines later, but ok)
Briefly, what does this PR introduce?
Adapt for custom submissions to narval
What kind of change does this PR introduce?
Please check if this PR fulfills the following:
Does this PR introduce breaking changes? What changes might users need to make to their code?
Does this PR change default behavior?